RNeXML: a package for reading and writing richly annotated phylogenetic, character, and trait data in R
NeXML is a powerful and extensible exchange standard recently proposed to
better meet the expanding needs for phylogenetic data and metadata sharing.
Here we present the RNeXML package, which provides users of the R programming
language with easy-to-use tools for reading and writing NeXML documents,
including rich metadata, in a way that interfaces seamlessly with the extensive
library of phylogenetic tools already available in the R ecosystem.
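RNeXML is an R package whose API is not described here, so the following is only a minimal Python sketch of what reading a NeXML document involves, using the standard library's ElementTree. The tiny document is a hand-written toy with hypothetical taxon labels and branch lengths; it assumes the NeXML 2009 namespace and the otus/otu, trees/tree, node and edge elements, and it has not been validated against the schema. `read_nexml` is a hypothetical helper, not an RNeXML function.

```python
import xml.etree.ElementTree as ET

# Assumed NeXML namespace URI; element names follow the NeXML 2009 schema as we understand it.
NS = {"nex": "http://www.nexml.org/2009"}

# Hand-written toy document with hypothetical taxa and branch lengths (not schema-validated).
SAMPLE = """<nex:nexml xmlns:nex="http://www.nexml.org/2009"
           xmlns="http://www.nexml.org/2009"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           version="0.9">
  <otus id="taxa1">
    <otu id="t1" label="taxon_A"/>
    <otu id="t2" label="taxon_B"/>
  </otus>
  <trees id="trees1" otus="taxa1">
    <tree id="tree1" xsi:type="nex:FloatTree">
      <node id="n1" root="true"/>
      <node id="n2" otu="t1"/>
      <node id="n3" otu="t2"/>
      <edge id="e1" source="n1" target="n2" length="0.4"/>
      <edge id="e2" source="n1" target="n3" length="0.6"/>
    </tree>
  </trees>
</nex:nexml>"""


def read_nexml(text):
    """Hypothetical helper: return (otu_id -> label, list of {id, edges} per tree)."""
    root = ET.fromstring(text)
    labels = {otu.get("id"): otu.get("label")
              for otu in root.iterfind("nex:otus/nex:otu", NS)}
    trees = []
    for tree in root.iterfind("nex:trees/nex:tree", NS):
        # Map node ids to the OTU they represent (internal nodes have none).
        node_otu = {node.get("id"): node.get("otu")
                    for node in tree.iterfind("nex:node", NS)}
        edges = []
        for edge in tree.iterfind("nex:edge", NS):
            target = edge.get("target")
            # Use the taxon label for tips and fall back to the node id otherwise.
            child = labels.get(node_otu.get(target), target)
            edges.append((edge.get("source"), child, float(edge.get("length", "nan"))))
        trees.append({"id": tree.get("id"), "edges": edges})
    return labels, trees
```

Calling `read_nexml(SAMPLE)` returns the two taxon labels and one tree whose edges link the root node `n1` to `taxon_A` and `taxon_B`. Rich metadata (`meta` annotations) is ignored in this sketch.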
Estimating the relative order of speciation or coalescence events on a given phylogeny
The reconstruction of large phylogenetic trees from data that violate
clocklike evolution (or as a supertree constructed from any m input trees)
raises a difficult question for biologists: how can one assign relative dates
to the vertices of the tree? In this paper we investigate this problem,
assuming a uniform distribution on the order of the inner vertices of the tree
(which includes, but is more general than, the popular Yule distribution on
trees). We derive fast algorithms for computing the probability that (i) any
given vertex in the tree was the j-th speciation event (for each j), and (ii)
any one given vertex is earlier in the tree than a second given vertex. We show
how the first algorithm can be used to calculate the expected length of any
given interior edge in any given tree that has been generated under either a
constant-rate speciation model or the coalescent model.
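For small trees, the quantities in (i) and (ii) can be checked by brute force. The Python sketch below is not the fast algorithm from the paper: it enumerates every ranking of the inner vertices that is consistent with the tree (each vertex after its parent), which is exponential in general, and it counts rankings with the hook-length formula m!/∏ h(v), where m is the number of inner vertices and h(v) is the number of inner vertices in the subtree rooted at v. The tree encoding (a dict from each inner vertex to its inner-vertex children, leaves omitted) and all function names are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial


def subtree_sizes(children, root):
    """h(v): number of inner vertices in the subtree rooted at each inner vertex v."""
    sizes = {}

    def visit(v):
        sizes[v] = 1 + sum(visit(c) for c in children.get(v, []))
        return sizes[v]

    visit(root)
    return sizes


def count_rankings(children, root):
    """Hook-length formula: m! / prod_v h(v) rankings of the m inner vertices."""
    sizes = subtree_sizes(children, root)
    denominator = 1
    for h in sizes.values():
        denominator *= h
    return factorial(len(sizes)) // denominator


def enumerate_rankings(children, root):
    """Yield every order of inner vertices in which each vertex follows its parent."""
    def extend(prefix, available):
        if not available:
            yield list(prefix)
            return
        for v in sorted(available):
            prefix.append(v)
            yield from extend(prefix, (available - {v}) | set(children.get(v, [])))
            prefix.pop()

    yield from extend([], {root})


def rank_probabilities(children, root):
    """P(vertex v is the j-th event) under the uniform distribution on rankings."""
    rankings = list(enumerate_rankings(children, root))
    probs = {}
    for ranking in rankings:
        for j, v in enumerate(ranking, start=1):
            probs.setdefault(v, {}).setdefault(j, Fraction(0))
            probs[v][j] += Fraction(1, len(rankings))
    return probs


def prob_earlier(children, root, u, v):
    """P(u precedes v) under the same uniform distribution."""
    rankings = list(enumerate_rankings(children, root))
    hits = sum(1 for r in rankings if r.index(u) < r.index(v))
    return Fraction(hits, len(rankings))
```

For the hypothetical tree `{"r": ["a", "b"], "b": ["c"]}` there are 3 rankings (r,a,b,c; r,b,a,c; r,b,c,a), so vertex `a` is equally likely to be the 2nd, 3rd or 4th event, and `a` precedes `c` with probability 2/3. Exact `Fraction` arithmetic keeps these checks free of rounding error.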
EvoIO: Community-driven standards for sustainable interoperability
Interoperability is the property that allows systems to work together independently of who created them, or how or for what purpose they were implemented. It is crucial for aggregating data from different online resources and for integrating different kinds of data. Interoperability is based on effective standards that become and remain broadly adopted. We argue that to develop and apply such standards for evolutionary and biodiversity data sustainably, we need a community-driven, open, and participatory approach. With the goal of building such an approach, the EvoIO collaboration emerged in 2009 from several NESCent-sponsored activities. EvoIO aims to be a nucleating center for developing, applying and disseminating interoperability technology that connects and coordinates stakeholders, developers, and standards bodies.

Members of the EvoIO group have harnessed a variety of collaborative events to successfully build an initial stack of interoperability technologies that is owned by the community and open to participation. The stack addresses syntax, semantics, and programmable services, and at present includes the following components: NeXML (http://nexml.org), a NEXUS-inspired XML format that is validatable yet extensible; CDAO (http://www.evolutionaryontology.org), an ontology of comparative data analysis formalizing the semantics of evolutionary data and metadata; and PhyloWS (http://evoinfo.nescent.org/PhyloWS), a web services interface standard for querying, retrieving, and referencing phylogenetic data on the web. Beyond demonstration prototypes, reference implementations of EvoIO stack technologies are starting to appear in production use.

Aside from producing such information artefacts, EvoIO devotes much of its energy to applying principles of communication and organization that result in open and inclusive processes of community science. One of the key tools employed by EvoIO is the hackathon event format. Hackathons are highly collaborative, hands-on working meetings that catalyze practical innovation, train researchers, and foster cohesion as well as a sense of shared ownership in the results. In summary, we find that broad community participation, buy-in, and ownership are critical for developing interoperability in a sustainable fashion, and that there are approaches and tools that can foster these effectively.
Publishing re-usable phylogenetic trees, in theory and practice
Sharing and re-use of data are essential to the progressive and self-correcting nature of science. In recognition of this principle, journals and funding agencies have adopted policies to encourage sharing of information ('data'), including empirical data as well as computed inferences such as phylogenetic trees. 
Here we summarize an ongoing analysis of 1) current practices for sharing phylogenetic trees and associated data; 2) current barriers to effective sharing and reuse of such data; and 3) prospects for reducing these barriers to promote more widespread sharing and re-use. Currently, the technical infrastructure is available to support (with some limitations) rudimentary archiving in conjunction with manuscript publication. Yet, most published trees are not archived, and there is no community standard governing the recommended format or content to ensure a re-usable phylogenetic record. Without a shift in emphasis toward re-usability, along with technology and standards to support such a shift, the value of trees (whether disseminated via public archives, or by other means) will be limited. Interviews with actual or potential secondary consumers of phylogenetic results suggest that there is a considerable market for re-use, but that most attempts end in disappointment. Phylogenetic results available via author requests, journal web sites, archival repositories and project web sites rarely include the critical information that secondary consumers seek, such as unique identifiers for biological sources (including species sources and accession numbers), indicators of quality, and documentation of the analytical methods used to obtain the results.
Based on the analysis presented here, we suggest that enabling effective re-use entails a commitment by the research community to several changes from current practice: 1) using globally unique identifiers (GUIDs) to reference informational and material entities; 2) developing and using technology for documenting and exchanging the metadata that facilitate re-use; and 3) supporting development and use of a minimal reporting standard that indicates what data and metadata are considered essential for a re-usable phylogenetic record. We suggest that re-use may be catalyzed most rapidly by identifying and targeting (with appropriate technology) the most promising circumstances for re-use. These might include the extraction of sub-trees from large trees (for use in reconciliation, classification, and comparative analysis); the re-use of seed alignments, sub-alignments and homologized characters; the linking of phylogenies to geographic information (for use in ecology, phylogeography and biogeography); and the construction of supertrees and supermatrices.
The Netherlands Biodiversity Data Services and the R package nbaR: Automated workflows for biodiversity data analysis
The value of data present in natural history collections for research in biodiversity, ecology and evolution
cannot be overstated. Naturalis Biodiversity Center of the Netherlands, home to one of the largest natural
history collections in the world, launched a large-scale digitisation project resulting in the registration of more
than 38 million specimen objects, many of them annotated with descriptive metadata, such as geographic
coordinates or multimedia content. Other resources hosted at Naturalis include species occurrence records
and comprehensive taxonomic checklists, such as the Catalogue of Life. As our institution strongly believes
in the Open Science paradigm, we seek to make our data available to the global biodiversity research
community, thereby enhancing data analysis workflows such as (i) the modelling of present, past and future
species distributions using specimen occurrence data, (ii) time calibration of (molecular) phylogenies using
dated specimen occurrences, (iii) taxonomic name resolution or (iv) image data mining. To this end, we
developed the Netherlands Biodiversity Data services [1], providing centralized access to biodiversity data
via state-of-the-art, open-access interfaces and a mechanism to assign persistent identifiers to all records.
Data are retrieved from heterogeneous sources and harmonized into a document store that complies with
international data standards such as ABCD (Access to Biological Collection Data [2]). Employing the
Elasticsearch engine, our infrastructure offers complex query options, near real-time queries, and the
scalability to accommodate anticipated data growth. Focusing on availability and accessibility, the services were
designed as a versatile, low-level REST API to allow the use of our data in a broad variety of applications
and services. For programmatic access to our data services, we developed client libraries for several
programming languages. Here we present the R package ‘nbaR’ [3], a client especially targeted to an
audience of biodiversity researchers. The R programming language has found wide acceptance in this field
in recent years, and our package provides a convenient means of connecting our data resources to existing
tools for statistical modelling and analysis. The abstraction layer introduced by the client lets the user
formulate even complex queries in a convenient manner, thereby lowering the access threshold to our data
services. We will demonstrate the potential and benefits of the services and the R client by integrating nbaR with
state-of-the-art packages for species distribution modelling and time calibration of phylogenetic trees into a
single analysis workflow.
1. Netherlands Biodiversity Data services – User documentation. http://docs.biodiversitydata.nl (accessed 17 May
2018).
2. Access to Biological Collections Data task group. 2007. Access to Biological Collection Data (ABCD), Version 2.06.
Biodiversity Information Standards (TDWG) http://www.tdwg.org/standards/115 (accessed 17 May 2018).
3. nbaR GitHub repository. https://github.com/naturalis/nbaR (accessed 17 May 2018).
Unsupervised Machine Learning to Classify the Confinement of Waves in Periodic Superstructures
We employ unsupervised machine learning to enhance the accuracy of our
recently presented scaling method for wave confinement analysis [1]. We use
the standard k-means++ algorithm as well as our own model-based algorithm. We
investigate cluster validity indices as a means to find the correct number of
confinement dimensionalities to be used as an input to the clustering
algorithms. Subsequently, we analyze the performance of the two clustering
algorithms when compared to the direct application of the scaling method
without clustering. We find that the clustering approach provides more
physically meaningful results, but may struggle with identifying the correct
set of confinement dimensionalities. We conclude that the most accurate outcome
is obtained by first applying the direct scaling to find the correct set of
confinement dimensionalities and subsequently employing clustering to refine
the results. Moreover, our model-based algorithm outperforms the standard
k-means++ clustering.
Comment: 24 pages, 11 figures
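The abstract does not give the authors' model-based algorithm or say which cluster validity indices were used, so the following is only a generic Python sketch of the two standard ingredients it names: k-means++ seeding followed by Lloyd iterations, and choosing the number of clusters with a validity index. The mean silhouette width is used here as one common index; the one-dimensional values standing in for confinement data are synthetic placeholders, not results from the paper.

```python
import math
import random


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans_pp_seed(points, k, rng):
    """k-means++: first center uniform, later ones with probability proportional to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def lloyd(points, centers, max_iter=100):
    """Standard Lloyd iterations; returns (centers, labels, inertia)."""
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for i, center in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # Move each center to its members' mean; an empty cluster keeps its old center.
            updated.append(tuple(sum(xs) / len(members) for xs in zip(*members))
                           if members else center)
        if updated == centers:
            break
        centers = updated
    labels = [nearest(p, centers) for p in points]
    inertia = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return centers, labels, inertia


def kmeans(points, k, n_init=10, seed=0):
    """Best of n_init k-means++ runs, judged by inertia."""
    rng = random.Random(seed)
    runs = [lloyd(points, kmeans_pp_seed(points, k, rng)) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])


def silhouette(points, labels):
    """Mean silhouette width; singleton clusters score 0. Needs at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
                / labels.count(other)
                for other in set(labels) - {labels[i]})
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def choose_k(points, candidates):
    """Pick the candidate k whose best clustering has the largest mean silhouette."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)[1]))


# Synthetic, well-separated 1-D values (hypothetical stand-ins for per-mode scaling results).
POINTS = [(x,) for x in (-0.1, 0.0, 0.1, 0.9, 1.0, 1.1, 1.9, 2.0, 2.1)]
```

On these three tight, well-separated groups, `choose_k(POINTS, range(2, 5))` selects 3 clusters. Real data rarely separate this cleanly, which is consistent with the abstract's observation that choosing the number of confinement dimensionalities is the hard part.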
Evolution of embryonic developmental period in the marine bird families Alcidae and Spheniscidae: roles for nutrition and predation?
Background: Nutrition and predation have been considered two primary agents of selection important in the evolution of avian life history traits. The relative importance of these natural selective forces in the evolution of avian embryonic developmental period (EDP) remains poorly resolved, perhaps in part because research has tended to focus on a single, high taxonomic-level group of birds: Order Passeriformes. The marine bird families Alcidae (auks) and Spheniscidae (penguins) exhibit marked variation in EDP, as well as behavioural and ecological traits ultimately linked to EDP. Therefore, auks and penguins provide a unique opportunity to assess the natural selective basis of variation in a key life-history trait at a low taxonomic level. We used phylogenetic comparative methods to investigate the relative importance of behavioural and ecological factors related to nutrition and predation in the evolution of avian EDP. Results: Three behavioural and ecological variables related to nutrition and predation risk (i.e., clutch size, activity pattern, and nesting habits) were significant predictors of residual variation in auk and penguin EDP based on models predicting EDP from egg mass. Species with larger clutch sizes, diurnal activity patterns, and open nests had significantly shorter EDPs. Further, EDP was found to be longer among birds that forage in distant offshore waters than among those that forage in near-shore waters, in line with our predictions, although the difference was not significant. Conclusion: Current debate has emphasized predation as the primary agent of selection driving avian life history diversification. Our results suggest that both nutrition and predation have been important selective forces in the evolution of auk and penguin EDP, and highlight the importance of considering these questions at lower taxonomic scales.
We suggest that further comparative studies on lower taxonomic-level groups will continue to constructively inform the debate on evolutionary determinants of avian EDP, as well as other life history parameters.
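The abstract does not say which phylogenetic comparative method produced the residual EDP values. One common choice is phylogenetic generalized least squares (PGLS) under a Brownian-motion model, sketched below in Python with NumPy purely as an illustration. The four-species tree, its covariance matrix, and the egg-mass and EDP values are all hypothetical, and `pgls` is not a function from any particular package.

```python
import numpy as np


def pgls(X, y, V):
    """Generalized least squares: beta = (X' V^-1 X)^-1 X' V^-1 y, plus raw residuals."""
    Vinv = np.linalg.inv(V)
    XtVinv = X.T @ Vinv
    beta = np.linalg.solve(XtVinv @ X, XtVinv @ y)
    residuals = y - X @ beta
    return beta, residuals


# Brownian-motion covariance for the hypothetical tree ((A:1,B:1):1,(C:1,D:1):1):
# diagonal = root-to-tip path length, off-diagonal = path length shared from the root.
V_TOY = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

# Hypothetical log egg mass (predictor) and log EDP (response) for species A-D.
log_egg_mass = np.array([4.0, 4.5, 5.2, 6.0])
log_edp = np.array([3.4, 3.5, 3.7, 3.8])

X = np.column_stack([np.ones_like(log_egg_mass), log_egg_mass])
beta, resid = pgls(X, log_edp, V_TOY)  # intercept/slope and residual EDP per species
```

The residuals play the role of "residual variation in EDP" that could then be related to clutch size, activity pattern, or nesting habit. With `V` set to the identity matrix, the estimate reduces to ordinary least squares, which is a quick sanity check on the implementation.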
Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis
BACKGROUND Recently, various evolution-related journals adopted policies to encourage or require archiving of phylogenetic trees and associated data. Such attention to practices that promote sharing of data reflects rapidly improving information technology, and rapidly expanding potential to use this technology to aggregate and link data from previously published research. Nevertheless, little is known about current practices, or best practices, for publishing trees and associated data so as to promote re-use. FINDINGS Here we summarize results of an ongoing analysis of current practices for archiving phylogenetic trees and associated data, current practices of re-use, and current barriers to re-use. We find that the technical infrastructure is available to support rudimentary archiving, but the frequency of archiving is low. Currently, most phylogenetic knowledge is not easily re-used due to a lack of archiving, lack of awareness of best practices, and lack of community-wide standards for formatting data, naming entities, and annotating data. Most attempts at data re-use seem to end in disappointment. Nevertheless, we find many positive examples of data re-use, particularly those that involve customized species trees generated by grafting to, and pruning from, a much larger tree. CONCLUSIONS The technologies and practices that facilitate data re-use can catalyze synthetic and integrative research. However, success will require engagement from various stakeholders including individual scientists who produce or consume shareable data, publishers, policy-makers, technology developers and resource-providers. 
The critical challenges for facilitating re-use of phylogenetic trees and associated data, we suggest, include: a broader commitment to public archiving; more extensive use of globally meaningful identifiers; development of user-friendly technology for annotating, submitting, searching, and retrieving data and their metadata; and development of a minimum reporting standard (MIAPA) indicating which kinds of data and metadata are most important for a re-usable phylogenetic record.
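Generating a customized species tree by pruning a much larger tree, one of the successful re-use patterns noted in the findings, is simple to express. The Python sketch below is illustrative only: the `Node` class, `prune`, and `to_newick` are hypothetical helpers rather than any library's API, and in practice the large tree would first be parsed from an archived file. Pruning drops tips outside the requested set and collapses any internal node left with a single child, adding its branch length to the child's so that root-to-tip path lengths are preserved.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: Optional[str] = None
    length: float = 0.0  # length of the branch leading to this node
    children: list = field(default_factory=list)


def prune(node: Node, keep: set) -> Optional[Node]:
    """Return a copy of the subtree restricted to taxa in keep, or None if none remain."""
    if not node.children:
        return Node(node.name, node.length) if node.name in keep else None
    kept = [c for c in (prune(ch, keep) for ch in node.children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # Collapse a node left with one child, merging the two branch lengths.
        only = kept[0]
        return Node(only.name, only.length + node.length, only.children)
    return Node(node.name, node.length, kept)


def to_newick(node: Node, is_root: bool = True) -> str:
    """Serialize to Newick; the root's own branch length is omitted."""
    text = node.name or ""
    if node.children:
        text = "(" + ",".join(to_newick(c, False) for c in node.children) + ")" + text
    if not is_root:
        text += f":{node.length:g}"
    return text + (";" if is_root else "")


# Hypothetical "large" tree ((A:1,B:1):1,(C:1,D:1):1);
BIG_TREE = Node(children=[
    Node(length=1.0, children=[Node("A", 1.0), Node("B", 1.0)]),
    Node(length=1.0, children=[Node("C", 1.0), Node("D", 1.0)]),
])
```

For example, keeping taxa A, C and D yields `(A:2,(C:1,D:1):1);`: B is removed and A's former parent is collapsed into a single branch of length 2.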
Fatal Hemothorax Caused by Pseudomesotheliomatous Carcinoma of the Lung
We present a case of a poorly differentiated pseudomesotheliomatous carcinoma originating in the lung, which manifested as the exceptionally rare complication of massive true hemothorax and persistent blood loss that proved rapidly fatal despite surgery. Pseudomesotheliomatous carcinoma of the lung and neoplasia-associated hemothorax are reviewed and discussed.
- …